StrAl: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time
Abstract
MOTIVATION: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem.
RESULTS: Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.
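The idea of scoring each alignment column by sequence similarity plus upstream and downstream pairing probability can be sketched as a pairwise dynamic program that runs in quadratic time. This is only an illustration: the abstract does not give the scoring formula, so the geometric-mean structure term, the weights `match`, `mismatch`, `struct_weight` and `gap`, and the helper names `pairing_vectors`, `column_score` and `align` are all assumptions, not StrAl's published method or parameters. The progressive multiple-alignment step is omitted.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# One position: (base, P(paired with an upstream base), P(paired with a downstream base))
Position = Tuple[str, float, float]


def pairing_vectors(seq: str, pair_probs: Sequence[Sequence[float]]) -> List[Position]:
    """Collapse a symmetric base pairing probability matrix into two vectors.

    pair_probs[i][j] is assumed to be the probability that i pairs with j.
    """
    profile = []
    for i, base in enumerate(seq):
        p_up = sum(pair_probs[i][:i])        # partner lies 5' of position i
        p_down = sum(pair_probs[i][i + 1:])  # partner lies 3' of position i
        profile.append((base, p_up, p_down))
    return profile


def column_score(a: Position, b: Position, match: float = 1.0,
                 mismatch: float = -1.0, struct_weight: float = 2.0) -> float:
    """Hypothetical score: sequence term plus weighted structure agreement."""
    seq_term = match if a[0] == b[0] else mismatch
    struct_term = sqrt(a[1] * b[1]) + sqrt(a[2] * b[2])
    return seq_term + struct_weight * struct_term


def align(x: List[Position], y: List[Position], gap: float = -2.0):
    """Global alignment (Needleman-Wunsch, linear gaps) in O(len(x) * len(y))."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (dp[i - 1][j - 1] + column_score(x[i - 1], y[j - 1]), "diag"),
                (dp[i - 1][j] + gap, "up"),
                (dp[i][j - 1] + gap, "left"),
            ]
            dp[i][j], back[i][j] = max(candidates, key=lambda c: c[0])
    # Trace back from the bottom-right corner to recover the alignment.
    out_x, out_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            out_x.append(x[i - 1][0]); out_y.append(y[j - 1][0])
            i, j = i - 1, j - 1
        elif move == "up":
            out_x.append(x[i - 1][0]); out_y.append("-")
            i -= 1
        else:
            out_x.append("-"); out_y.append(y[j - 1][0])
            j -= 1
    return dp[n][m], "".join(reversed(out_x)), "".join(reversed(out_y))
```

Because each position carries only two structure values rather than a full row of the pairing matrix, the recursion has the same shape as ordinary sequence alignment. Under these assumed weights, a compensatory mutation (for example a G-C pair replaced by C-G) still scores positively in the paired columns even though the bases mismatch.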
Related articles
RNA Structural Alignment with Conditional Random Fields
Computational identification of non-coding RNA regions in the genome has attracted much attention. However, it is substantially harder than gene finding for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for st...
Alignment of RNA base pairing probability matrices
MOTIVATION Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are...
Induction of apoptosis and necrosis in human acute erythroleukemia cells by inhibition of long non-coding RNA PVT1
Recent advances in molecular medicine have proposed new therapeutic strategies for cancer. One of the molecular research lines for the diagnosis and treatment of cancer is the use of long non-coding RNAs (lncRNAs), a class of non-coding RNA molecules longer than 200 nucleotides that act as key regulators of gene expression. Different aspects of cellular activities like cell...
RNA Base Pairing Probability Alignment by Genetic Algorithm
The Sankoff algorithm is one of the most attractive ideas for predicting consensus RNA secondary structure [1]. In this algorithm, the RNA sequence alignment problem and the RNA folding problem are solved simultaneously by dynamic programming. However, due to its high computational complexity in both time and space, the Sankoff algorithm has not been used to solve practical problems in which usually RNA seq...
Alignment of RNA with Structures of Unlimited Complexity
Sequence-structure alignment of RNA with arbitrary secondary structure is Max-SNP-hard. Therefore, the problem of RNA alignment is commonly restricted to nested structures, where dynamic programming yields efficient solutions. However, nested structures cannot model pseudoknots or even more complex structural dependencies. Nevertheless, those dependencies are essential and conserved features of ma...
Journal: Bioinformatics
Volume 22, Issue 13
Publication date: 2006